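Journal of Beijing University of Posts and Telecommunications (北京邮电大学学报)

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2008, Vol. 31, Issue (3): 33-37. doi: 10.13190/jbupt.200803.33.283

• Papers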

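A Large-Scale Real-Time Spam Short Message Filtering System (一个大规模垃圾短信实时过滤系统)

HUANG Wen-liang (黄文良)1,2, LI Shi-jian (李石坚)1, LIU Ju-xin (刘菊新)1, XU Cong-fu (徐从富)1

  1. College of Computer Science, Zhejiang University, Hangzhou 310027; 2. China Unicom Zhejiang Branch, Hangzhou 310006
  • Received: 2007-11-25 Revised: 1900-01-01 Online: 2008-06-28 Published: 2008-06-28
  • Corresponding author: HUANG Wen-liang (黄文良)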

A Large-Scale Online Spam Short Message Filtering System

HUANG Wen-liang1,2, LI Shi-jian1, LIU Ju-xin1, XU Cong-fu1   

  1. College of Computer Science, Zhejiang University, Hangzhou 310027, China
  2. Zhejiang Branch of China Unicom Corporation Ltd., Hangzhou 310006, China
  • Received: 2007-11-25 Revised: 1900-01-01 Online: 2008-06-28 Published: 2008-06-28
  • Contact: HUANG Wen-liang

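Abstract (translated from the Chinese):

Based on an analysis of the shortcomings of existing short message monitoring systems, a spam short message monitoring and filtering system is designed that combines text classification with behavior recognition. The system jointly considers the characteristics of message-sending behavior and of message text content, and filters messages efficiently by combining real-time classification with offline classification. In addition, a set of feedback-based self-learning mechanisms is designed so that the classifiers can learn incrementally. Compared with traditional methods, the proposed method substantially improves both filtering efficiency and accuracy.

Keywords (translated from the Chinese): spam short message filtering, statistical learning, text classification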

Abstract:

Spam short messages annoy cell-phone users and mobile service providers every day. This paper presents a new spam short message filtering system that combines online filtering with offline classification. The system filters messages efficiently according to both sending-behavior characteristics and message content. In addition, a self-learning mechanism based on operator feedback enables the system's classifiers to improve themselves from the filtering results. Compared with traditional methods, the presented method performs better in terms of both filtering efficiency and accuracy.

Keywords: spam short message filtering, statistical learning, text categorization
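
The abstract names three ingredients: filtering on sending behavior and message content, a combination of online and offline classification, and operator feedback that lets the classifiers keep learning. The Python sketch below is one minimal way these pieces could fit together; it is not the authors' implementation. The class names, the per-sender message limit, the whitespace tokenizer, and the choice of a naive Bayes classifier are all assumptions made for illustration (the abstract does not specify any of them, and real Chinese short messages would need a proper word segmenter).

```python
import math
from collections import Counter


class IncrementalNaiveBayes:
    """Multinomial naive Bayes whose counts are updated one message at a time."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab = set()

    def update(self, tokens, label):
        # Incremental learning: only counters change, no retraining pass.
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def spam_log_odds(self, tokens):
        # log P(spam | tokens) - log P(ham | tokens), with add-one smoothing.
        score = math.log((self.doc_counts["spam"] + 1) / (self.doc_counts["ham"] + 1))
        v = len(self.vocab) + 1
        totals = {c: sum(self.token_counts[c].values()) for c in ("spam", "ham")}
        for t in tokens:
            p_spam = (self.token_counts["spam"][t] + 1) / (totals["spam"] + v)
            p_ham = (self.token_counts["ham"][t] + 1) / (totals["ham"] + v)
            score += math.log(p_spam / p_ham)
        return score


class SpamFilter:
    """Hypothetical two-stage filter: behavior check first, then content check."""

    def __init__(self, max_msgs_per_window=3):
        self.model = IncrementalNaiveBayes()
        self.max_msgs_per_window = max_msgs_per_window  # assumed threshold
        self.sent_in_window = Counter()  # messages per sender in the current window
        self.log = []  # (sender, tokens, verdict), kept for offline classification

    def filter_online(self, sender, text):
        tokens = text.lower().split()  # assumption: whitespace tokenization
        self.sent_in_window[sender] += 1
        if self.sent_in_window[sender] > self.max_msgs_per_window:
            verdict = "blocked"  # sending-behavior rule
        elif self.model.spam_log_odds(tokens) > 0:
            verdict = "blocked"  # content rule
        else:
            verdict = "delivered"
        self.log.append((sender, tokens, verdict))
        return verdict

    def classify_offline(self):
        # Batch re-scoring of delivered messages with the current model;
        # returns candidates for operator review.
        return [(s, toks) for s, toks, verdict in self.log
                if verdict == "delivered" and self.model.spam_log_odds(toks) > 0]

    def feedback(self, tokens, label):
        # Operator feedback ("spam" or "ham") updates the classifier in place.
        self.model.update(tokens, label)
```

In this sketch the online path does only constant-time counter lookups per token, while anything the model missed at delivery time can still be surfaced later by `classify_offline` once feedback has shifted the counts.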

CLC number: